- THE LONG ROAD AHEAD
- Difficulties in Machine Intelligence
-
-
- by
- Peter Dudey (pdudey@willamette.edu, hagbard on the IGS)
-
-
- a Willamette University
- Undergraduate Research Grant
-
-
-
- 28 August 1992
-
-
- ACKNOWLEDGEMENTS
-
- I would like to thank the University for funding me in this
- work, and Professor Jim Levenick for advising me. I was greatly
- aided by the insights of David Fotland and the other programmers at
- the North American Computer Go Championship, and the Go players
- on the Internet Go Server who tested and commented on Fumiko. All
- those involved in setting up the server and maintaining the Go
- directory at the University of Washington were also helpful.
- Finally, I would like to thank Ian Osgood for joining me in my
- continued work.
-
- GENERAL OUTLINE
-
- I have spent the past three months applying machine learning
- techniques to the Japanese game of Go. I have learned a great deal
- about Go, programming in general, Go programming in particular, and
- intelligence. My insights on these last two topics are delineated in
- "On Intelligence And Learning."
- I have produced a working Go program, "Fumiko", which can be
- found on the public Go directory at the University of Washington
- computer system (milton.u.washington.edu on the internet). The
- program is somewhat competent, and defeated David Stoutamire's
- "Goprogram" in an untimed game over the internet. Fumiko is rather
- slow, though, and lost all of her rounds at the North American
- Computer Go Championship, held here in August at the U.S. Go
- Congress, by running out of time. (I gave Fumiko a feminine name,
- and use such pronouns, because she would usually play black, and by
- Taoist symbolism black is feminine.) Fumiko was never able to
- learn; this turned out to be too massive a problem, for reasons that
- will be explained in the section entitled "Why Is This So Difficult?".
- Although Fumiko was only completed in early August, I have
- learned so much, in writing her and at the Congress, that in
- retrospect she seems primitive and clumsy. (By comparison with the
- other programs in the Championship, she was.) Although the period
- of this grant is ending, I am continuing my work: I am working with
- Ian Osgood, a programmer from Beaverton, on a new program,
- "HellGo". This is discussed in greater detail in "Future Work."
-
- ON INTELLIGENCE AND LEARNING
-
- In the machine learning class I took from Professor Levenick
- last semester, a great deal of time was spent on the question of what
- it means to be intelligent. I believe intelligence involves the ability
- to ask oneself and answer three questions:
- 1) What's going on?
- 2) What should I do?
- 3) How should I do this?
- The third question may be applied to itself recursively. If a
- person decides they want to get rich by selling widgets, they must
- ask themselves how they are going to sell them. If they decide to
- sell them by advertising in the back of Soldier of Fortune magazine,
- they must consider how to go about placing such an ad.
- For a program to learn, it must be able to answer an additional
- question:
- 4) What effect will this have?
- When this question is answered, a prediction is being made,
- and this allows for use of the scientific method: make a prediction,
- test it (by performing the action decided upon), correct the
- prediction to whatever extent it didn't hold, and repeat the process.
- Correcting the prediction is the essence of learning. If a person
- or program keeps making the same wrong prediction, they are
- certainly not learning. How to make such a correction, however, is a
- diabolically difficult problem.
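-
-      As a rough sketch only (these names are hypothetical, not
- Fumiko's actual code, and the 0.5 correction rate is arbitrary),
- the predict-test-correct cycle might look like this:
-
-     class Learner:
-         def __init__(self):
-             self.predictions = {}   # (situation, action) -> expected effect
-
-         def predict(self, situation, action):
-             """Question 4: what effect will this action have?"""
-             return self.predictions.get((situation, action), 0.0)
-
-         def correct(self, situation, action, actual_effect):
-             # Correcting the prediction is the essence of learning:
-             # move the stored guess partway toward what really happened.
-             guess = self.predict(situation, action)
-             self.predictions[(situation, action)] = guess + 0.5 * (actual_effect - guess)
-
-     def learning_loop(learner, choose_action, perform, situations):
-         for situation in situations:                    # 1) what's going on?
-             action = choose_action(situation)           # 2), 3) what to do, and how
-             expected = learner.predict(situation, action)   # 4) the prediction
-             actual = perform(situation, action)             # test it
-             if actual != expected:
-                 learner.correct(situation, action, actual)  # and repeat
-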
- The primitive learning algorithms we implemented in the
- machine learning class had it fairly easy; after they chose an action
- (implicitly predicting that this was the right answer), we would tell
- them what they should have done. If they got it wrong, they would
- alter their action-choosing mechanism to be more likely to choose
- that answer if faced with the exact same problem. This learning
- would generalize somewhat, in that the programs would usually then
- respond in a similar way to similar problems.
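-
-      A sketch of that sort of primitive learner (hypothetical
- names; the reinforcement factor of 1.5 is arbitrary):
-
-     import random
-
-     class WeightedChooser:
-         """Picks actions with probability proportional to their weights."""
-         def __init__(self, actions):
-             self.weights = {action: 1.0 for action in actions}
-
-         def choose(self):
-             total = sum(self.weights.values())
-             pick = random.uniform(0, total)
-             for action, weight in self.weights.items():
-                 pick -= weight
-                 if pick <= 0:
-                     return action
-             return action    # fall through only on rounding error
-
-         def reinforce(self, correct_action):
-             # Told the right answer after the fact: make that answer more
-             # likely the next time the same problem (or a similar one) appears.
-             self.weights[correct_action] *= 1.5
-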
- In Go, things are not so simple, for two reasons: we don't know
- the "right" answer, and the situation is so complex that it's difficult to
- tell exactly what went wrong.
- In a simple game like Tic-Tac-Toe, it may be said that a move
- is right or wrong: correct moves lead to victory (or at least a tie),
- incorrect moves to defeat. Most humans, and simple computer
- programs, can work out every possible sequence of moves up to the
- end of the game. If a move will lead to the player winning
- regardless of how the opponent responds, it may be said to be the
- correct move.
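-
-      A sketch of that exhaustive reading-out for Tic-Tac-Toe (the
- board is a list of nine cells holding 'X', 'O', or None); a move
- is "correct" if, after making it, the opponent's best outcome is
- no better than a tie:
-
-     def winner(board):
-         rows = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
-                 (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
-         for a, b, c in rows:
-             if board[a] is not None and board[a] == board[b] == board[c]:
-                 return board[a]
-         return None
-
-     def best_outcome(board, player):
-         """+1 if 'player' (to move) can force a win, 0 a tie, -1 a loss."""
-         won = winner(board)
-         if won is not None:
-             return 1 if won == player else -1
-         moves = [i for i, cell in enumerate(board) if cell is None]
-         if not moves:
-             return 0                      # board full: a tie
-         opponent = 'O' if player == 'X' else 'X'
-         results = []
-         for move in moves:
-             board[move] = player
-             # Whatever is best for the opponent afterwards is worst for us.
-             results.append(-best_outcome(board, opponent))
-             board[move] = None
-         return max(results)
-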
- A human cannot do this with a game like Chess. There are
- simply too many possibilities--there are roughly 10^450 possible
- games.
- One may just try to "look ahead" further than the opponent,
- and make moves which will lead to a good board position. This,
- however, depends on being able to accurately judge how good a
- board position is. If one is going to consider three moves that each
- player might make, for each of the next ten turns, one must perform
- 3^10 (nearly 60,000) such judgements.
- The best Chess programs do work by brute force: looking ahead
- many more moves than the opponent and using special-purpose
- hardware to make simple judgements of how good a board position is
- (e.g., counting pieces).
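-
-      A sketch of that brute-force style: look a fixed number of
- moves ahead and judge the leaves by counting material. The board
- interface used here (pieces, legal_moves, make_move, undo_move,
- opponent) is assumed for illustration, not taken from any actual
- Chess program:
-
-     PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}
-
-     def evaluate(position, player):
-         """Crude judgement of a board position: my material minus theirs."""
-         score = 0
-         for piece, owner in position.pieces():
-             value = PIECE_VALUES[piece]
-             score += value if owner == player else -value
-         return score
-
-     def lookahead(position, player, depth):
-         moves = position.legal_moves()
-         if depth == 0 or not moves:
-             return evaluate(position, player)
-         best = float('-inf')
-         for move in moves:
-             position.make_move(move)
-             # The opponent's best reply is, from our side, the worst case.
-             best = max(best, -lookahead(position, position.opponent(player), depth - 1))
-             position.undo_move(move)
-         return best
-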
- This approach does not work in Go. The number of possible
- games is vastly higher (10^750), there are more legal moves in a turn
- (several hundred, compared to a dozen or so in Chess), there are
- more moves in a game (around 300, compared to about 40), and
- judging a board position is much more difficult. (This last has to do
- with the more significant interplay between pieces in Go.)
-      Not only can we not tell a Go program what the right answer
- is; deciding on a move is also a sufficiently complicated process
- that it is difficult to say exactly what went wrong.
- By way of analogy, suppose a woman walks into the street and
- is hit by a truck. She survives, and when she awakes in the hospital
- her husband tells her, "I hope you learned your lesson!"
- What is the lesson to be learned? Why did she fail in her task
- of not being hit by a truck? Did she fail to look both ways before
- crossing the street? Did she look, but not see the truck because she
- wasn't wearing her glasses? Was her timing off in running ahead of
- or behind the truck? Did she fail to put on high-visibility clothes
- that morning? Was it not her fault at all, but that of the truck
- driver? Was it some combination of these? Is her husband correct
- in telling her to change her behavior, or does he think the accident
- was divine retribution for some otherwise unrelated action?
- If something bad happens in a Go game (a group is captured or
- the game is lost), there is often not one particular move to blame.
- Also, when a bad move is made, the consequences may not be
- immediately apparent, as with the woman choosing not to wear her
- Day-Glo vest that day and then being flattened several hours later.
- Even if a bad move can be spotted, it is not always clear why it
- was bad. It may have been a bad way to attempt to save a group, or
- it may have saved the group but nonetheless been a bad move
- because it ignored some other, more urgent situation elsewhere on
- the board. Finding the weak link(s) in a chain of reasoning is
- difficult even for humans.
- Finally, even if we can find out what went wrong and why it
- was wrong, current computer programs are not good at figuring out
- how to change the offending behavior. The approach being most
- rigorously explored right now involves telling the program to rely
- more, in the future, on those parts that tend to offer good advice. This
- does not create any new solutions to the lowest-level problems, but
- makes the best possible use of those techniques the program has.
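-
-      Purely as an illustration of that idea (not any particular
- program's scheme), one can imagine a set of advisers whose
- weighted votes pick the move, with the weights nudged after each
- game:
-
-     class Adviser:
-         """A source of advice: scores candidate moves, carries a weight."""
-         def __init__(self, name, score_move):
-             self.name = name
-             self.score_move = score_move   # function: (board, move) -> score
-             self.weight = 1.0
-
-     def choose_move(board, legal_moves, advisers):
-         def combined(move):
-             return sum(a.weight * a.score_move(board, move) for a in advisers)
-         return max(legal_moves, key=combined)
-
-     def update_weights(advisers, gave_good_advice):
-         # No new low-level techniques are invented here; the program
-         # merely leans harder on the parts whose advice tends to work.
-         for adviser in advisers:
-             adviser.weight *= 1.1 if gave_good_advice(adviser) else 0.9
-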
- Fortunately, in a limited realm such as a game, there is a level
- at which the program doesn't need to find new ways to do things,
- and there are things it will always want to do. The rules of the game
- provide a finite list of possible ways to try to do anything (the legal
- moves), and the overriding goal is always to win the game. There
- would be no point in the program being able to override these limits.
- In larger contexts, such as life itself, it can be argued that there are
- no such limits; there are always new ways to do something, and we
- haven't found any places to stop asking, "Why?" A program that did
- not have upper and lower limits on these chains of reasoning would
- not merely be intelligent, it would be sentient.
-
-
- WHY IS THIS SO DIFFICULT?
-
- One thing I have gained from this project is an appreciation of
- the sheer immensity of the problem of computer Go. Playing the
- game has provided careers for generations of professionals; the best
- programs, which are over 80,000 lines long and encompass decades
- of person-years of work, play only as moderate amateurs.
- Why, one might wonder, is Go so difficult to program? Why
- can't we just ask a master player how they choose their moves, and
- then use the computer's calculating power to do it better and faster?
- The biggest reason is that good players don't know why they
- choose the moves they choose. If asked, they will often respond, "It
- just looked right."
- I have found three particular obstacles, things which even
- beginning human players do better than any extant programs:
- grouping, spotting salient details, and making intuitional judgements.
- By grouping, I mean recognizing groups of stones which are
- near each other and not cut apart by enemy stones. Even an
- amateur can perform this task instantaneously and with a fairly high
- degree of accuracy.
- This is difficult to program because of the vagueness of such
- concepts as "near". In Go, a pair of stones can be linked across a
- greater distance than a single stone could manage and still belong
- to the same group, and enemy stones in the area have an effect which
- is hard to define explicitly. Even if a
- threshold distance is known, finding the distance between two stones
- would involve the Pythagorean theorem; this only takes a fraction of
- a second, but if it must be done several thousand times a turn it can
- be cripplingly slow.
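-
-      A naive sketch of that kind of grouping (the 2.5-point
- threshold is arbitrary, and enemy stones are ignored entirely,
- which is exactly the hard part left out):
-
-     import math
-
-     def proximity_groups(stones, threshold=2.5):
-         """stones: list of (x, y) points of one colour. Clusters stones
-         whose mutual distance falls under the threshold."""
-         groups = []
-         for stone in stones:
-             near = [group for group in groups
-                     if any(math.hypot(stone[0] - x, stone[1] - y) <= threshold
-                            for (x, y) in group)]
-             merged = [stone]
-             for group in near:
-                 merged.extend(group)
-                 groups.remove(group)
-             groups.append(merged)
-         return groups
-
-      Each call to math.hypot here is one of those Pythagorean
- distance computations, and they are what pile up into the
- thousands of checks per turn described above.
-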
- People are also very good at spotting things which are
- important or different. Although we ignore most of the cars we see
- on the road, we could spot a police car instantly. The most efficient
- way for a computer to spot it would be to check every car seen. It is
- not clear whether humans are doing this subconsciously, but if we
- are we are simultaneously doing that and thousands of other things
- (checking for odd smells, looking for anyone we know, listening to
- the radio, watching street signs . . . ) at amazing speed.
- Within a field, this ability to jump straight to the important
- part is what makes for expertise. A musician can immediately pick
- out the melody in a song, ignoring the other notes. An English
- teacher can spot spelling errors in a ten-page essay; if he is checking
- every word against some internal dictionary, he's doing it fast
- enough that he still has time to comprehend the essay and look for a
- thesis. A Go player needn't try to defend a group that isn't
- threatened, and doesn't consciously check each group every turn.
- Because we don't process (or process very quickly) most of the
- information that comes our way, we can make snap judgements,
- coming up with a good answer without having complete information.
-      For example, if a person unwittingly bites into a jalapeño, they
- can instantly grab a glass of water and quaff it. From a programming
- standpoint, this involves going through the list of items on the table,
- comparing the quenching potential of water with that of other
- available beverages, considering whether the clear liquid is in fact
- hydrochloric acid, and so on. Whether the person is not doing this or
- is doing it very quickly is not clear.
- One type of snap judgement particularly useful in Go is the
- ability to spot the few good moves. After a few seconds, a player
- will have discarded all but two or three of the hundreds of legal
- moves. Again, they are not consciously going through each possible
- move; they are thinking, "I either want to do this, or that, or that."
- Here, perhaps, is an area where computers have the advantage: a
- person could not remember analyses of fourteen possible moves.
- In writing Fumiko, I spent most of my time on the sequencer,
- the part which would read out possible sequences of plays. At the
- Congress, Dave Fotland pointed out that this was not a good idea.
- Reading out sequences is very time-consuming, and time spent
- considering consequences of dumb moves is essentially wasted. Most
- of the better programs concentrate on selecting moves to consider. If
- a program can spot a half-dozen good moves intuitionally, even
- playing one of those at random is not likely to mess up the game
- horribly. When professional Go players take on several opponents at
- once, they may be doing something akin to this.
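-
-      A sketch of that shape of program (the board interface and
- the heuristic functions are assumed; keeping six candidates is an
- arbitrary choice):
-
-     def candidate_moves(board, heuristics, keep=6):
-         """Score every legal move with cheap heuristics; keep the best few."""
-         ranked = sorted(board.legal_moves(),
-                         key=lambda move: sum(h(board, move) for h in heuristics),
-                         reverse=True)
-         return ranked[:keep]
-
-     def choose(board, heuristics, read_out_sequences):
-         # Spend the expensive sequence reading only on the few candidates.
-         best_move, best_value = None, float('-inf')
-         for move in candidate_moves(board, heuristics):
-             value = read_out_sequences(board, move)
-             if value > best_value:
-                 best_move, best_value = move, value
-         return best_move
-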
- While they are much faster, snap judgements are not as
- accurate as complete analysis, and can lead to closed-mindedness. If
- the best answer is unorthodox, it may never be considered, or may be
- thrown out with the bad answers. Often, as in the jalapeño example,
- we don't have time to perform a complete analysis; other times, we
- don't have enough information on the topic (as in judging a person),
- and we really don't have time to fully analyze all of the information
- we get, which is why we must ignore so much of it. An intelligence
- must walk a fine line; too much analysis takes too long, and too little
- may miss the best answers--especially if our intuitions are flawed.
-
- FUTURE WORK
-
- As mentioned in the introduction, this will not be the end of
- my work in computer Go. Work on "HellGo" has commenced, and it
- will feature several improvements over Fumiko:
- First, and perhaps most importantly, the data structures will be
- smaller, faster, and more efficient. This is made possible by the
- programming experience I have gained working on Fumiko, and by
- the advantage of having a collaborator. To give just one example, the
- structure used to store the locations of the stones on the board is
- nearly six times smaller than it used to be, and appears to be a little
- faster.
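-
-      Purely as an illustration (this is not HellGo's actual
- structure, which is not described here), a board can be held as a
- single flat array with one byte per intersection, rather than a
- full record at every point:
-
-     EMPTY, BLACK, WHITE = 0, 1, 2
-     SIZE = 19
-
-     class CompactBoard:
-         """One byte per intersection instead of a larger per-point record."""
-         def __init__(self):
-             self.points = bytearray(SIZE * SIZE)
-
-         def get(self, x, y):
-             return self.points[y * SIZE + x]
-
-         def put(self, x, y, colour):
-             self.points[y * SIZE + x] = colour
-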
- We will not ignore the grouping question, as I did with Fumiko.
- Already, HellGo has a loose concept of groups, and we will be
- improving this, writing sections to detect whether stones or chains of
- stones are connected.
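-
-      The well-defined half of that task, finding chains of stones
- that are solidly connected along board lines, can be sketched as a
- simple flood fill; the looser judgement of when separate chains
- form one group is the vague part discussed earlier and is not
- attempted here:
-
-     def chain_at(stones, start):
-         """stones: dict mapping (x, y) -> 'black' or 'white'. Returns the
-         set of points solidly connected to the stone at start."""
-         colour = stones.get(start)
-         if colour is None:
-             return set()
-         chain, frontier = set(), [start]
-         while frontier:
-             x, y = frontier.pop()
-             if (x, y) in chain:
-                 continue
-             chain.add((x, y))
-             for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
-                 if stones.get(neighbour) == colour:
-                     frontier.append(neighbour)
-         return chain
-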
- More emphasis will be put on the move selector, the part which
- decides which moves should be considered in reading out sequences.
- These parts in Fumiko were rather sloppily thrown together as the
- Congress approached, and she relied on the sequence reader--a bad
- idea for reasons explained earlier.
- We do hope to get HellGo to learn, or at least be teachable.
- Currently, I envision a program where, if it made a bad move, the
- human opponent could ask it why it did so, and then point out the
- weak link in a displayed chain of reasoning.
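-
-      One entirely speculative sketch of how that might look
- (nothing here exists in HellGo yet; the names and the 0.8 factor
- are invented for illustration):
-
-     class ReasonedMove:
-         """A move that remembers the chain of reasons behind it."""
-         def __init__(self, point, reasons):
-             self.point = point
-             self.reasons = reasons      # list of (reason name, contribution)
-
-         def explain(self):
-             return [name for name, _ in self.reasons]
-
-     def teach(reason_weights, move, weak_reason):
-         # The human opponent points at the weak link in the displayed
-         # chain of reasoning; that reason counts for less in the future.
-         if weak_reason in dict(move.reasons):
-             reason_weights[weak_reason] = reason_weights.get(weak_reason, 1.0) * 0.8
-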
- Finally, we will be spending more time on this program than I
- was able to spend on Fumiko; it may be several years before it is
- completed. We intend to seek further funds from other sources to
- continue our work. In any case, I for one will be doing this in my
- spare time for the foreseeable future.
-